Centroid-based Clustering

Centroid-based methods group data points by their proximity to a centroid (the cluster center).

Proximity is typically measured with a distance metric, most commonly Euclidean distance.
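
As a minimal illustration (the helper name is mine), Euclidean distance between two points can be computed like this:

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two points given as coordinate sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Distance from a point to two candidate centroids:
point = (1.0, 2.0)
euclidean_distance(point, (0.0, 0.0))  # ≈ 2.236
euclidean_distance(point, (1.0, 3.0))  # 1.0
```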

Examples

K-Means Clustering

How it works

  1. Choose the number K of clusters
  2. Select K points at random as the initial centroids (not necessarily from your dataset)
  3. Assign each data point to the closest centroid => that forms K clusters
  4. Compute and place the new centroid of each cluster
  5. Reassign each data point to the new closest centroid
  6. Repeat steps 4-5 until the centroids stop moving (i.e. the cost function reaches its minimum)
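
The steps above can be sketched in plain NumPy (the function name is mine; empty-cluster handling is omitted for brevity):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal K-Means sketch: returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Step 2: pick K initial centroids (here: random rows of X)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 3: assign each point to its closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: recompute each centroid as the mean of its cluster
        # (note: a cluster can become empty; not handled in this sketch)
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 6: stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Two well-separated blobs; k=2 should recover them
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])
centroids, labels = kmeans(X, k=2)
```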
Important

K-Means Clustering vs. KNN (K-Nearest Neighbor)

| K-Means clustering | KNN (K-Nearest Neighbors) |
| --- | --- |
| k = number of clusters | k = number of nearest neighbors |
| unsupervised learning | supervised learning |
| clustering | regression & classification |
| to optimize, use the elbow method | to optimize, use cross-validation & a confusion matrix |

Tips during use

Random initialisation trap: a poor random placement of the initial centroids can make K-Means converge to a bad local minimum; K-Means++ initialisation mitigates this by spreading the initial centroids apart.
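
A sketch of K-Means++ seeding (function name mine): each new centroid is sampled with probability proportional to its squared distance from the nearest centroid chosen so far, so the starting centroids tend to be spread out.

```python
import numpy as np

def kmeans_pp_init(X, k, seed=0):
    """K-Means++ seeding sketch: returns k initial centroids drawn from X."""
    rng = np.random.default_rng(seed)
    # First centroid: a uniformly random data point
    centroids = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # Squared distance of each point to its nearest chosen centroid
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        # Sample the next centroid proportionally to that squared distance
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centroids)

X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 5.2]])
initial_centroids = kmeans_pp_init(X, k=2)
```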

Choose the right number of clusters
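
The elbow method mentioned above can be sketched as follows, assuming scikit-learn is available (its `inertia_` attribute is the within-cluster sum of squares, WCSS): run K-Means for a range of k, plot WCSS against k, and pick the "elbow" where the curve flattens.

```python
import numpy as np
from sklearn.cluster import KMeans  # assumes scikit-learn is installed

X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])

# WCSS for k = 1..4; WCSS always decreases as k grows,
# so look for the k after which the decrease becomes marginal
wcss = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
        for k in range(1, 5)]
# Plotting wcss against k would show a sharp "elbow" at k = 2 for this data
```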

Measure the separability between clusters: Silhouette Method
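
A sketch of the silhouette method, again assuming scikit-learn: the silhouette score lies in [-1, 1], and higher values mean clusters are dense and well separated, so the k with the highest average score is a reasonable choice.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score  # assumes scikit-learn is installed

X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])

# Average silhouette for each candidate k; pick the k that maximizes it
scores = {}
for k in (2, 3, 4):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)
```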